Skip to content

Conversation

@roystgnr
Copy link
Member

If we add only a subset of tests to the runner (via --re or --deny_re options), we're careful to move the other tests to a "rejects" suite so they'll get deleted there, but the "supertest", the suite holding all the other tests, wasn't getting deleted in those cases.

This (on top of my earlier fixes, and using the MOOSE suppressions file for third-party library issues) gets selective valgrind runs of our unit tests clean for me. (all-tests runs are clean either way)

This isn't urgent to merge; I still need to run through our examples to look for valgrind issues too. This tiny leak is only a problem because it upsets valgrind, and until we're sure we're ready to make that be a big deal (add a --error-exitcode= option to our valgrind recipes), "do our unit tests upset valgrind" isn't a critical question.

@roystgnr
Copy link
Member Author

Scratch "gets selective valgrind runs of our unit tests clean for me" - I'm still seeing something else, and I'm not sure if it's just something I missed before or a regression from this PR.

Copy link
Member

@jwpeterson jwpeterson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some questions that could probably be cleared up if I read the cppunit docs, but I chose not to 🤷‍♂️

tests/driver.C Outdated
runner, rejects);
if (n_tests_added >= 0)
libMesh::out << "--- Running " << n_tests_added << " tests in total." << std::endl;
if (n_tests_added != -12345)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but since this is our own magic number that we now actually have to refer to, it might make sense to use libMesh::invalid_uint or some other named constant.

if (n_tests_added >= 0)
libMesh::out << "--- Running " << n_tests_added << " tests in total." << std::endl;
if (n_tests_added != -12345)
owned_suite.reset(suite);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I don't understand why we don't need to clean up suite when there's no tests added? From a surface level reading of the code it just looks like registry.makeTest() returns a dumb pointer whose lifetime we are expected to manage, and we were always leaking it before...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should clean up suite when there's no tests added. That'll return n_tests_added == 0, and 0 != -12345, so we put suite in our unique_ptr and it gets cleaned up.

@moosebuild
Copy link

moosebuild commented Sep 10, 2025

Job Coverage, step Generate coverage on 76de8d1 wanted to post the following:

Coverage

c144a6 #4249 76de8d
Total Total +/- New
Rate 65.26% 65.26% -0.01% -
Hits 77376 77370 -6 0
Misses 41184 41190 +6 0

Diff coverage report

Full coverage report

This comment will be updated on new commits.

@roystgnr
Copy link
Member Author

There's definitely something weird going on here. This patch cleans up the leaked suite from TestFactoryRegistry::makeTest(), but it starts complaining about a test suite destructor hitting invalid accesses to already-freed memory from TestSuiteFactory<AllSecondOrderTest>::makeTest() in particular.

If every test class was complaining then I'd be sure I'm trying to do a double-free here, once from the runner and then once from recursive destruction from the suite, but it's just AllSecondOrderTest complaining?

No, wait, it's too much of a coincidence that the failing AllSecondOrderTest is first alphabetically. I must be trying to do a double-free here, but something about the first UB causes the destructor to skip the rest.

If we add only a subset of tests to the runner (via --re or --deny_re
options), we're careful to move the other tests to a "rejects" suite so
they'll get deleted there, but the "supertest", the suite holding all
the other tests, wasn't getting deleted in those cases.

This (on top of my earlier fixes, and using the MOOSE suppressions file
for third-party library issues) gets selective valgrind runs of our unit
tests clean for me.
@roystgnr roystgnr force-pushed the dont_leak_test_suite branch from f452222 to 0d0d8d1 Compare November 24, 2025 22:08
@roystgnr
Copy link
Member Author

I think I've managed to square the circle of "we can't add a test to a runner without passing ownership of the pointer to the runner" and "when we get a test from a suite there's no way to take ownership of it away from the suite": we get each test from the suite, we wrap it in our own TestShim that passes along every API call except the destructor, and we hand the shim to the runner. The runner deletes the shims, then we can delete the suite to delete the non-shimmed tests, and nothing gets destroyed 2 times or 0 times.

I've added our Valgrind (unit tests) recipe, and I'm testing with valgrind myself; if nothing screams then this will finally be ready to merge.

@roystgnr roystgnr merged commit e21a264 into libMesh:devel Nov 27, 2025
22 checks passed
@roystgnr roystgnr deleted the dont_leak_test_suite branch November 27, 2025 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants